Sarah Strochak, Kyle Ueyama, Aaron R. Williams
What is 2 + 2?
## [1] 4
What is the median price of diamonds with carat > 1 and a Good cut?
## # A tibble: 1 x 1
## `median(price)`
## <int>
## 1 6412
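The code that produced this output is not shown. A minimal sketch of one way to ask the question, assuming the `diamonds` data set that ships with ggplot2 and the dplyr verbs `filter()` and `summarize()` (the object name `result` is illustrative):

```r
library(dplyr)
library(ggplot2)  # assumed source of the diamonds data set

# Keep diamonds over 1 carat with a "Good" cut, then take the median price
result <- diamonds %>%
  filter(carat > 1, cut == "Good") %>%
  summarize(median(price))

result
```

Leaving the summary column unnamed gives it the header `median(price)`, which matches the output above.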
How could increasing the retirement age affect the poverty rates of Hispanic women ages 62 and older?
scale
maps
documents
Deliberate steps should be taken to minimize the chance of making an error and maximize the chance of catching errors when errors inevitably occur.
Computational reproducibility should be embraced to improve accuracy, promote transparency, and prove the quality of analytical work.
Code should be written so humans can easily understand what’s happening—even if it occasionally sacrifices machine performance.
Analyses should be designed so strangers can understand each and every step without additional instruction or inquiry from the original analyst.
Research and data are non-rival and non-excludable. They are public goods that should be widely and easily shared. Decisions about tools, methods, data, and language during the research process should be made in ways that promote the ability of anyone and everyone to access an analysis.
Analysts should seek to make all parts of the research process more efficient by communicating clearly, adopting best practices, and managing computation.
.R and .Rmd
“Good coding style is like correct punctuation: you can manage without it, butitsuremakesthingseasiertoread.” ~ Hadley Wickham
Collections of R, C, C++, and FORTRAN code that expand the functionality of R.
The Comprehensive R Archive Network (CRAN) was introduced in 1997.
Repository of popular R packages with basic standards and quality control.
Comprehensive set of tools for data science
Core: ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, forcats
Free text by Hadley Wickham and Garrett Grolemund
Scalars (do not exist in R)
Vectors
## [1] 1 2 3 4 5
Matrices
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
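The vector and matrix printed above could be built as in the following sketch. The object names `v` and `m` are illustrative, and the original slides may have used different calls.

```r
# There are no scalars in R: a single number is a vector of length one
length(4)

# A vector: an ordered collection of values of a single type
v <- c(1, 2, 3, 4, 5)
v

# A matrix: a vector with two dimensions, filled column by column by default
m <- matrix(1:6, nrow = 2)
m
```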
Data frames, multidimensional arrays
## # A tibble: 4 x 4
## name awake brainwt bodywt
## <chr> <dbl> <dbl> <dbl>
## 1 Cheetah 11.9 NA 50
## 2 Owl monkey 7 0.0155 0.48
## 3 Mountain beaver 9.6 NA 1.35
## 4 Greater short-tailed shrew 9.1 0.00029 0.019
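The output above is a tibble whose source is not shown. As a sketch, the same four rows can be typed into a base R data frame by hand (the name `animals` is an assumption made for illustration):

```r
# Each column is a vector; all columns have the same length
animals <- data.frame(
  name = c("Cheetah", "Owl monkey", "Mountain beaver",
           "Greater short-tailed shrew"),
  awake = c(11.9, 7, 9.6, 9.1),
  brainwt = c(NA, 0.0155, NA, 0.00029),
  bodywt = c(50, 0.48, 1.35, 0.019)
)

str(animals)    # column names and types
animals$awake   # pull out one column as a vector
```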
Character
## [1] "a" "b" "c" "d" "e"
Numeric
## [1] 1 2 3 4 5
Logical
## [1] TRUE TRUE FALSE TRUE FALSE
Factor
## [1] good ok bad ok ok
## Levels: good ok bad
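A short sketch of how vectors of each type might be created. The object names are illustrative, and the factor levels are set explicitly so they print in the order shown above.

```r
chars <- c("a", "b", "c", "d", "e")
nums <- c(1, 2, 3, 4, 5)
flags <- c(TRUE, TRUE, FALSE, TRUE, FALSE)

# Without the levels argument, levels would be sorted alphabetically
quality <- factor(c("good", "ok", "bad", "ok", "ok"),
                  levels = c("good", "ok", "bad"))

class(chars)
class(nums)
class(flags)
class(quality)
```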
A great strength of R!
NA is R’s encoding for missing values.
Missing values are contagious.
## [1] NA
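To illustrate what "contagious" means here, the sketch below uses a made-up vector `x`: any calculation that touches an `NA` returns `NA` unless the missing value is dropped explicitly.

```r
x <- c(1, NA, 3)

sum(x)                # NA, because one input is unknown
sum(x, na.rm = TRUE)  # 4, after removing the missing value
is.na(x)              # FALSE TRUE FALSE
```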
R can hold many different objects at the same time. This requires assignment.
<-
## [1] 4
## [1] 4
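A minimal sketch of assignment with `<-`, using illustrative object names `x` and `y`. Typing an object's name prints its value.

```r
x <- 2 + 2  # store the result under the name x
x

y <- x      # copy the value into a second object
y
```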
Arguments by position
## [1] 2.5
Arguments by name
## [1] 2.5
Function documentation ?mean
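The two calling styles can be sketched with `mean()`, whose first argument is named `x`. The input vector `values` is an assumption chosen to give the result shown above.

```r
values <- c(1, 2, 3, 4)

# By position: the first argument is matched to x
mean(values)

# By name: arguments can be given in any order
mean(na.rm = TRUE, x = c(values, NA))
```

Both calls return 2.5. Naming arguments makes a call easier to read when a function takes many options.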
Rule of three: never write the same code three or more times; write a function instead.
## [1] "odd!" "even!" "odd!" "even!" "odd!" "even!" "odd!" "even!"
## [9] "odd!" "even!"
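One way to produce the output above is a small custom function. The name `odd_even` is hypothetical, since the original function is not shown.

```r
# Label each number as odd or even; ifelse() works on whole vectors
odd_even <- function(x) {
  ifelse(x %% 2 == 0, "even!", "odd!")
}

odd_even(1:10)
```

Once the logic lives in one function, a fix or change is made in one place instead of in every copy.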
What will it take to convince you that your code is correct?
Are values that must be positive non-positive?
Write the test first!
Each time you encounter a bug, write a test that will convince you the bug no longer exists.
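A minimal sketch of such a test in base R, using `stopifnot()`. The helper `check_positive` and the example values are assumptions made for illustration; packages such as testthat offer a fuller framework.

```r
# Stop with an error if any value that must be positive is not
check_positive <- function(x) {
  stopifnot(is.numeric(x), !anyNA(x), all(x > 0))
  invisible(x)
}

check_positive(c(10.5, 3, 42))  # passes silently

# A non-positive value raises an error, caught here to show the result
failed <- tryCatch({
  check_positive(c(10.5, -3, 42))
  FALSE
}, error = function(e) TRUE)

failed
```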
1: use it, use it again, use it some more.
Photo by StataCorp LP, CC BY-SA 4.0, Unaltered
Source is unknown